What is a social network?

A social network analysis describes all the individuals in a group AND all the links between them. The photo above shows a social network from the show The L Word. These are all individuals who have romantic and sexual connections to each other.

What do social network analyses tell us?

Social network analysis includes the theory, methods, and applications of social networks.

In this vignette, you will learn (1) how to prepare social network data for analysis and (2) how to perform quadratic assignment procedures to assess correlations between social networks (response variables) and factors that may influence networks (predictor variables).

A network analysis (as opposed to a non-network analysis) will reveal information about the group as a whole, as opposed to the individual. Looking back at our social network from The L Word, non-network analysis would only look at Shane’s romantic connections alone, but social network analysis can tell us Shane’s romantic connections in relation to Alice’s and Lauren’s, etc. It could also reveal group level characteristics that don’t exist for individuals. We are looking at the patterns of relationships between individuals. An edge is as much a network-object as a node is.

Terms to know

  • Nodes: the individuals in the social network

  • Edges: the relationships between individuals in a network

  • Directionality: some social networks will have a relationship that goes in one direction (e.g., Bonobo A grooms Bonobo B. Other relationships won’t have directionality, such as friendship). Directionality is relevant for understanding transactional or rank-based relationships.

  • Edge weight: the strength of the relationship between individuals, which may differ between individuals within social networks (e.g., if A gives B money, we might be interested the amount they gave. This value is the edge’s weight)

By the end of this module, you should be able to answer the following questions about social networks in R:
1) How are individuals connected in a group?
2) How can we test the extent to which specific factors influence whether individuals in a group are connected?
3) How do networks change over time?

To accomplish this, we will use social network datasets from nonhuman primates!

Installing packages

Here are the packages we’ll be using:

library(statnet)
library(igraph)
library(intergraph)

{igraph} and {statnet} are two of the most commonly used packages for social network analyses. Both include built-in functions that can plot networks and answer almost every beginner-level question about a given network. However, there may be some functions that are present in one package but not the other (or just work better). So, {intergraph} is another useful package that helps users transition between {igraph} and {statnet}.

Understanding the structure of social network data

Loading data

Let’s start by loading our data. We’ll start with a dataset about bonobos and their grooming partners and GG rubbing partners.

library(curl)
f <- curl("https://raw.githubusercontent.com/nmerullo/AN588_BonoboSpitChain/main/AnzaEtAl_2021_ggr_weighted.csv")
bonobo_ggr_edgelist <- read.csv(f, header = TRUE, sep = ",")
head(bonobo_ggr_edgelist) # data about bonobos' GG rubbing interactions
##   actor receiver GGR.Weight
## 1    ln       ky          4
## 2     h        j         20
## 3     j        h          4
## 4    ky       lc         15
## 5    lc       ky         20
## 6    lc       ln         31
q <- curl("https://raw.githubusercontent.com/nmerullo/AN588_BonoboSpitChain/main/AnzaEtAl_2021_grooming_weighted.csv")
bonobo_groom_edgelist <- read.csv(q, header = TRUE, sep = ",")
head(bonobo_groom_edgelist) # data about bonobos' grooming interactions
##   actor receiver weight
## 1     j       li      1
## 2    li        h      3
## 3     h        j      6
## 4     j        h      3
## 5     h       li      2
## 6    ln       dn      8

Setting up social network data

How we set-up and manage the social network data can impact the ease with which we navigate our social network analysis. Based on how many individuals there are, whether we know any of their properties (age, sex, etc), and information about the edges themselves (directionality, edge-weights, etc), we can store and input data either as a matrix or an edgelist.

Adjacency matrix

A matrix used for social network analysis is called a sociomatrix or an adjacency matrix. They are square matrices. For directed networks, rows indicate the starting node, and columns indicate the receiving node. Undirected networks are symmetric matrices around the diagonal. For un-weighted networks, the cells contain a 1, for the presence of an edge and a 0 for the absence of an edge between individuals. For weighted networks, the cells can contain the edge-weights instead of 1s and 0s.

Edge list

Alternatively, we can also represent network data as a columned list, called an edge list. Each row of the list represents one edge (one relationship between individuals). In directed data, the first column names the sender of the edge and the second column names the receiver. In undirected data, it does not matter which node is listed first. For weighted data, a third column contains a numeric value for each tie.

Is our bonobo dataset structured as an edge list or adjacency matrix?

Node-Level Variables

To include variables that describe your nodes (age, rank, sex, etc), it is good practice to load in a separate file. This data frame must include one column of the node IDs using the exact same IDs used in the adjacency matrix or edge list. All other variables should then be included in additional columns.

j <- curl("https://raw.githubusercontent.com/nmerullo/AN588_BonoboSpitChain/main/AnzaEtAl_2021_attributes.csv")
bonobo_attribute <- read.csv(j, header = TRUE, sep = ",")
head(bonobo_attribute)
##   actor rank age
## 1    ln 3.42   9
## 2     h 3.42  31
## 3     j 6.20  24
## 4    ky 4.66  11
## 5    lc 4.11   9
## 6    dn 5.57  44

Now we can create the network objects and add node attributes to the objects. Note that adding note attributes to objects involves ‘%v%’, which resembles a pipe. This functions similarly to a $ operator and allows us to add rank variables to a dataset.

bonobo_ggr_net <- as.network.data.frame(bonobo_ggr_edgelist) # creates GG rubbing network 
bonobo_groom_net<-as.network.data.frame(bonobo_groom_edgelist) # creates grooming network

bonobo_ggr_net %v% "rank" <- bonobo_attribute$rank #adds rank variable to GG rubbing network
bonobo_ggr_net %v% "age" <- bonobo_attribute$age #adds age variable to GG rubbing network
bonobo_groom_net %v% "rank" <- bonobo_attribute$rank #adds rank variable to grooming network
bonobo_groom_net %v% "age" <- bonobo_attribute$age #adds age variable to grooming network

Visualizing social network data

We can visualize social networks by creating network objects with {statnet}. Note the arrows indicating directionality!

par(mfrow = c(1, 2))
plot(bonobo_ggr_net, label = "vertex.names", main="GG rubbing")
box(col = "black")
plot(bonobo_groom_net, label = "vertex.names", main="Grooming")
box(col = "black")

Quadratic Assignment Procedure (QAP)

When analyzing social network data, we want to know to what degree our variables impact the formation of a particular network organization. Our independent variables or predictor variables are node level variables (e.g. age) or similarity between node level variables (e.g. whether pairs belong to the same age category). Our dependent variable is the organization of edges in the network.

We cannot use standard OLS regressions to analyze pair characteristics and the presence of edges between pairs because it violates the assumption that observations are independent. For example, observations A-B, A-C, and A-D are not independent because they all involve individual A. Or if one bonobo is GG rubbing with 5 other bonobos, those 5 other bonobos may also be GG rubbing with each other. Quadratic assignment procedure (QAP) addresses this issue of non-independence.

QAP is a resampling-based method that controls for non-independence in network structure via random permutations. These permutations use different arrangements of the rows and columns in the adjacency matrix. Thus, the network structure is maintained but the arrangement of individuals in the structure is randomized. This represents the null hypothesis because it should eliminate any potential correlations between ties and independent values.

There are 3 types of QAP regressions we can run with {statnet}: correlation regressions, linear regressions, and multiple regressions. We will describe and demonstrate each regression using the bonobo data.

1. Correlation

Correlation QAPs simply test whether two social networks are correlated. For example, using our bonobo datasets, we can ask: Do bonobos who groom each other correlate with bonobos who GG rub with one another?

We can start by using the gcor() function to get a correlation coefficient between the two networks.

gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054

These networks don’t seem very correlated but let’s test for significance anyways.

qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
                gcor, # the function you're using is correlation between networks (gcor) 
                g1=1, # use first graph in the list
                g2=2, # use second graph in the list
                reps = 1000) # number of permutations to run
summary(qap_cor)
## 
## QAP Test Results
## 
## Estimated p-values:
##  p(f(perm) >= f(d)): 0.301 
##  p(f(perm) <= f(d)): 0.842 
## 
## Test Diagnostics:
##  Test Value (f(d)): 0.1111054 
##  Replications: 1000 
##  Distribution Summary:
##      Min:     -0.4191705 
##      1stQ:    -0.1010049 
##      Med:     0.005050247 
##      Mean:    -0.004812886 
##      3rdQ:    0.1111054 
##      Max:     0.8534918

[DESCRIBE/INTERPRET OUTPUT]

Should we plot these and explain them? Is that helpful at all? -sam

2. Multiple regression (weighted)

Multiple regression QAPs test the extent to which predictor variables are affecting edge weights.
For example, if our predictor variable is rank, we can ask: Are high-ranking or low-ranking individuals more tied through GG rubbing? In other words, we are asking: how many times are you likely to have more edges for each unit increase in rank?

Predictor variables must be matrices in order to be used in the QAP functions. We can look at whether rank affects the weight of ties sent or received (or both, but they must be added to the model separately).

First, we must create a matrix that shows the rank of senders. Note that senders are represented by ROWS in the matrices.

nodes <- 7 # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in

rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the age of each actor is repeated over entire ROW of matrix
rank_sending
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23

Next, we must create a matrix that shows the rank of receivers. Note that receivers are represented by COLUMNS in the matrices.

rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the age of each actor is repeated over entire COLUMN of matrix
rank_receiving
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23

Now we can run our multiple regression! First we are asking: Does rank affect how many times bonobos G-G rubbed others (the weight of the edges they sent)?

mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
                list(rank_sending)) # list of predictor variables as network or matrix objects
summary(mrqap)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.4534205 -0.3093578 -0.2036344  0.5465795  0.8695587 
## 
## Coefficients:
##             Estimate   Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.8507548 0.996   0.004   0.005    
## x1          -0.1161796 0.014   0.986   0.034    
## 
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227  Adjusted R-squared: 0.03883 
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)       x1
## Min       -2.52845 -1.88290
## 1stQ      -0.56955 -0.55450
## Median     0.14564  0.01265
## Mean       0.11505  0.02629
## 3rdQ       0.83359  0.52868
## Max        2.61709  1.87439

[DESCRIBE/INTERPRET OUTPUT]

Does rank affect how many times bonobos received G-G rubbing from others (the weight of the edges they receive)?

mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
                list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(mrqap)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.3992884 -0.3393159 -0.2434687  0.6007116  0.8021901 
## 
## Coefficients:
##             Estimate    Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.64715036 0.992   0.008   0.011    
## x1          -0.07247427 0.049   0.951   0.093    
## 
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423  Adjusted R-squared: -0.0001605 
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)        x1
## Min      -2.116679 -1.233718
## 1stQ     -0.505828 -0.507257
## Median    0.064646 -0.007029
## Mean      0.073798 -0.020126
## 3rdQ      0.664057  0.479828
## Max       2.053671  1.347807

[DESCRIBE/INTERPRET OUTPUT]

We can include multiple predictor variables! We can also use another network as a predictor variable! For example, we can ask: Do sender rank and grooming relationships predict the number of times bonobos G-G rubbed in the network?

mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
                list(rank_receiving, bonobo_groom_net)) # list of all predictor variables as network or matrix objects

summary(mrqap)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.4834824 -0.3406152 -0.2400259  0.5709808  0.8093067 
## 
## Coefficients:
##             Estimate    Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  0.63034548 0.992   0.008   0.010    
## x1          -0.07893218 0.030   0.970   0.071    
## x2           0.12308498 0.787   0.213   0.479    
## 
## Residual standard error: 0.4699 on 39 degrees of freedom
## Multiple R-squared: 0.04076  Adjusted R-squared: -0.008433 
## F-statistic: 0.8286 on 2 and 39 degrees of freedom, p-value: 0.4442 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)       x1       x2
## Min       -1.95987 -1.28828 -2.89829
## 1stQ      -0.53884 -0.37780 -0.89607
## Median     0.04723  0.02143 -0.07872
## Mean       0.04834  0.02077  0.08003
## 3rdQ       0.62551  0.45594  0.72284
## Max        1.98692  1.57047 10.38170

[DESCRIBE/INTERPRET OUTPUT]

3. Linear regression

Linear regression QAPs test the extent to which independent variables are affecting the presence or absence of edges.
Using our bonobo dataset, we can ask: Does older age make an individual more or less likely to G-G rub with other individuals?
In this example, our grooming network is the response variable while age is our predictor variable.

Similar to the multiple regression, we must first create a matrix that shows the rank of senders and receivers. As a reminder, senders are represented by ROWS while receivers are represented by COLUMNS in matrices

nodes <- 7 # number of nodes in the data set
age <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we're interested in

age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create empty matrix to be filled
age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
age_sending
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
for (i in 1:nodes){ 
age_receiving[,i] <- rep(age[i], nodes)} 
age_receiving
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42  6.2 4.66 4.11 5.57 5.23

Now we can run our linear regression!

lrqap <- netlogit(bonobo_ggr_net, # response variable is a network object with an unweighted adjacency matrix
                  list(age_receiving, age_sending)) # list of all predictor variables as network or matrix objects
summary(lrqap)
## 
## Network Logit Model
## 
## Coefficients:
##             Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept)  4.5524763 94.8670398 0.971   0.029   0.055    
## x1          -0.4903676  0.6124013 0.030   0.970   0.069    
## x2          -0.6817886  0.5057117 0.009   0.991   0.026    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
##   10.77041 on 3 degrees of freedom, p-value 0.01303442 
## AIC: 53.45396    BIC: 58.66696 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.2040994 
##  (Dn-Dr)/Dn: 0.1849811 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##          Actual
## Predicted    0    1
##         0   25   11
##         1    4    2
## 
##  Total Fraction Correct: 0.6428571 
##  Fraction Predicted 1s Correct: 0.3333333 
##  Fraction Predicted 0s Correct: 0.6944444 
##  False Negative Rate: 0.8461538 
##  False Positive Rate: 0.137931 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)       x1       x2
## Min       -2.04329 -1.57284 -2.01549
## 1stQ      -0.60970 -0.48545 -0.71649
## Median     0.08668  0.05638 -0.06637
## Mean       0.06877  0.04245 -0.04359
## 3rdQ       0.74504  0.62798  0.53998
## Max        2.04528  1.67968  2.02638

In the output above, x1 represents the first variable in the model (age_receiving), and x2 represents the second variable in the model (age_sending). [DESCRIBE/INTERPRET OUTPUT]

Spider Monkey Social Networks

[Something here about why we’re doing this again with spider monkey data, maybe describe spider monkeys as a species a little]

First, we must load our data.

spmonkey_dat <- curl("https://raw.githubusercontent.com/nmerullo/AN588_BonoboSpitChain/main/spidermonkey_beh_edgelist.csv")
spmonkey_el <- read.csv(spmonkey_dat)
head(spmonkey_el)
##   Focal        Beh Rec Mins
## 1    Ba In contact  Bo  262
## 2    Ba In contact  Db    1
## 3    Ba In contact  Dl  134
## 4    Ba In contact  Ku    6
## 5    Ba In contact  Nw   10
## 6    Ba In contact  Pe   37
unique(spmonkey_el$Beh)
## [1] "In contact" "mating"     "play"       "play "      "groom"

We see here that spider monkeys are engaging in four unique social behaviors in this dataset: in contact, mating, playing, and grooming. This means that we can construct four separate social networks. Additionally, the packages that we use can only read edge lists that are in the form of a 2-column format. So, let’s clean this data up a bit more.

First, we are starting with the mating network.

mating_spmonkey <- spmonkey_el[spmonkey_el$Beh == "mating",] #making a separate data frame for mating
mating_spmonkey <- mating_spmonkey[,-c(2,4)] #removing unnecessary data, we'll add back this "Mins" data as edge attributes later 
sp_mating_net <- as.network(mating_spmonkey, matrix.type = "edgelist", directed = TRUE)
plot(sp_mating_net, label = "vertex.names", main="Spider Monkey Mating Network")

Next, let’s add network and edge attributes to the network.

sp_attr <- curl("https://raw.githubusercontent.com/nmerullo/AN588_BonoboSpitChain/main/spidermonkey_attributes.csv")
sp_attr <- read.csv(sp_attr)
head(sp_attr)
##   Number ID    sex   age dead.alive Repr_status
## 1      1 Pe female adult       dead   preg/lact
## 2      2 Ku female adult       dead   preg/lact
## 3      3 Dl female adult       dead   preg/lact
## 4      4 Ba female adult       dead   preg/lact
## 5      5 Vi female adult       dead   preg/lact
## 6      6 Nw   male adult      alive repr_active
sp_attr$age[sp_attr$age == "infant"] <- 1
sp_attr$age[sp_attr$age == "juvenile"] <- 2
sp_attr$age[sp_attr$age == "adult"] <- 3
sp_attr$age <- as.numeric(sp_attr$age)
class(sp_attr$age)
## [1] "numeric"
sp_attr$sex[sp_attr$sex == "female"] <- 0
sp_attr$sex[sp_attr$sex == "male"] <- 1
sp_attr$sex <- as.numeric(sp_attr$sex)
class(sp_attr$sex)
## [1] "numeric"
sp_mating_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_mating_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_mating_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute

We can also add the “Mins” from the initial spider monkey edge list as edge weights.

sp_mating_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "mating",]$Mins 

This process can be repeated for grooming, play and contact networks.

play_spmonkey <- spmonkey_el[spmonkey_el$Beh == "play",][,-c(2,4)] #making a separate data frame for play behavior
sp_play_net_unweighted <- as.network(play_spmonkey, matrix.type = "edgelist", directed = TRUE, weighted=FALSE)
plot(sp_play_net_unweighted, label = "vertex.names", main="Spider Monkey Play Network")

# Adding network and edge attributes to network
sp_play_net_unweighted %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_play_net_unweighted %v% "age" <- sp_attr$age #adds age as node attribute
sp_play_net_unweighted %v% "repr" <- sp_attr$Repr_status #adds reproductive status as node attribute
groom_spmonkey <- spmonkey_el[spmonkey_el$Beh == "groom",][,-c(2,4)] #making a separate data frame for grooming behavior
sp_groom_net <- as.network(groom_spmonkey, matrix.type = "edgelist", directed = TRUE, loops = TRUE)
plot(sp_groom_net, label = "vertex.names", main="Spider Monkey Grooming Network")

# Adding network and edge attributes to the network
sp_groom_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_groom_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_groom_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute

# Let's add the "Mins" from the initial spmonkey edgelist as edge weights
sp_groom_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "groom",]$Mins 
contact_spmonkey <- spmonkey_el[spmonkey_el$Beh == "In contact",][,-c(2,4)] #making a separate data frame for contact
sp_contact_net <- as.network(contact_spmonkey, matrix.type = "edgelist", directed = TRUE, loops = TRUE)
plot(sp_contact_net, label = "vertex.names", main="Spider Monkey Contact Network")

# Adding network and edge attributes to the network
sp_contact_net %v% "sex" <- sp_attr$sex #adds sex as node attribute
sp_contact_net %v% "age" <- sp_attr$age #adds age as node attribute
sp_contact_net %v% "repr" <- sp_attr$Repr_status #adds reproductive status as a node attribute

# Let's add the "Mins" from the initial spmonkey edgelist as edge weights
sp_contact_net %e% "time" <- spmonkey_el[spmonkey_el$Beh == "In contact",]$Mins 

Now that we have loaded, cleaned, and visualized our spider monkey dataset, we can analyze it via QAP regressions!

QAPs for Spider Monkey Data

Correlation

Does grooming frequency correlate with contact frequency in spider monkeys?

# get the correlation value
gcor(sp_groom_net, sp_contact_net)
## [1] 0.362977
# is it significant?
qap_cor <- qaptest(list(sp_groom_net, sp_contact_net), # include both network objects in a list
                gcor, # the function you're using is correlation between networks (gcor) 
                g1=1, # use first graph in the list
                g2=2, # use second graph in the list
                reps = 1000) # number of permutations to run
summary(qap_cor)
## 
## QAP Test Results
## 
## Estimated p-values:
##  p(f(perm) >= f(d)): 0.002 
##  p(f(perm) <= f(d)): 0.999 
## 
## Test Diagnostics:
##  Test Value (f(d)): 0.362977 
##  Replications: 1000 
##  Distribution Summary:
##      Min:     -0.3266793 
##      1stQ:    -0.09074425 
##      Med:     0 
##      Mean:    -0.003956449 
##      3rdQ:    0.09074425 
##      Max:     0.4718701

Multiple regression (weighted)

Create a matrix that shows the rank of senders. NOTE: Senders are represented by ROWS in the matrices

nodes <- 17 # the number of nodes in the data set
age <- sp_groom_net %v% "age" # a vector of the node-level variable we are interested in

age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create an empty matrix to be filled

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
age_sending
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [2,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [3,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [4,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [5,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [6,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [7,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [8,]    3    3    3    3    3    3    3    3    3     3     3     3     3
##  [9,]    3    3    3    3    3    3    3    3    3     3     3     3     3
## [10,]    3    3    3    3    3    3    3    3    3     3     3     3     3
## [11,]    3    3    3    3    3    3    3    3    3     3     3     3     3
## [12,]    2    2    2    2    2    2    2    2    2     2     2     2     2
## [13,]    2    2    2    2    2    2    2    2    2     2     2     2     2
## [14,]    2    2    2    2    2    2    2    2    2     2     2     2     2
## [15,]    1    1    1    1    1    1    1    1    1     1     1     1     1
## [16,]    2    2    2    2    2    2    2    2    2     2     2     2     2
## [17,]    1    1    1    1    1    1    1    1    1     1     1     1     1
##       [,14] [,15] [,16] [,17]
##  [1,]     3     3     3     3
##  [2,]     3     3     3     3
##  [3,]     3     3     3     3
##  [4,]     3     3     3     3
##  [5,]     3     3     3     3
##  [6,]     3     3     3     3
##  [7,]     3     3     3     3
##  [8,]     3     3     3     3
##  [9,]     3     3     3     3
## [10,]     3     3     3     3
## [11,]     3     3     3     3
## [12,]     2     2     2     2
## [13,]     2     2     2     2
## [14,]     2     2     2     2
## [15,]     1     1     1     1
## [16,]     2     2     2     2
## [17,]     1     1     1     1

Create a matrix that shows the rank of receivers NOTE: Receivers are represented by COLUMNS in the matrices

age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_receiving[,i] <- rep(age[i], nodes)} # The age of each actor is repeated over entire COLUMN of matrix
age_receiving
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [2,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [3,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [4,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [5,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [6,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [7,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [8,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##  [9,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [10,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [11,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [12,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [13,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [14,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [15,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [16,]    3    3    3    3    3    3    3    3    3     3     3     2     2
## [17,]    3    3    3    3    3    3    3    3    3     3     3     2     2
##       [,14] [,15] [,16] [,17]
##  [1,]     2     1     2     1
##  [2,]     2     1     2     1
##  [3,]     2     1     2     1
##  [4,]     2     1     2     1
##  [5,]     2     1     2     1
##  [6,]     2     1     2     1
##  [7,]     2     1     2     1
##  [8,]     2     1     2     1
##  [9,]     2     1     2     1
## [10,]     2     1     2     1
## [11,]     2     1     2     1
## [12,]     2     1     2     1
## [13,]     2     1     2     1
## [14,]     2     1     2     1
## [15,]     2     1     2     1
## [16,]     2     1     2     1
## [17,]     2     1     2     1

Is age category predictive of time spent in grooming behavior within dyads of spider monkeys?

mrqap <- netlm(sp_groom_net, # response variable is a network object with a weighted adjacency matrix
                list(age_sending)) # list of all predictor variables as network objects or matrix objects

summary(mrqap)
## 
## OLS Network Model
## 
## Residuals:
##         0%        25%        50%        75%       100% 
## -0.3642857 -0.3642857 -0.1214286  0.2500000  0.6357143 
## 
## Coefficients:
##             Estimate   Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -0.3642857 0.045   0.955   0.097    
## x1           0.2428571 1.000   0.000   0.007    
## 
## Residual standard error: 0.4001 on 270 degrees of freedom
## Multiple R-squared: 0.1524   Adjusted R-squared: 0.1492 
## F-statistic: 48.54 on 1 and 270 degrees of freedom, p-value: 2.472e-11 
## 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Coefficient Distribution Summary:
## 
##        (intercept)       x1
## Min       -6.05463 -9.49060
## 1stQ      -2.03562 -1.81499
## Median    -0.43093  0.20046
## Mean      -0.27574 -0.01849
## 3rdQ       1.41907  2.01954
## Max        8.32077  6.71228

Linear regression

Does reproductive status predict mating behavior (number of ties in mating social network)?

nodes <- 16 # the number of nodes in the data set
sex <- sp_play_net_unweighted %v% "sex" # a vector of the node-level variable we are interested in

sex_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create an empty matrix to be filled

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
sex_sending[i,] <- rep(sex[i], nodes)} # age of each actor is repeated over entire ROW of matrix
sex_sending
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    0    0    0    0    0    0    0    0    0     0     0     0     0
##  [2,]    0    0    0    0    0    0    0    0    0     0     0     0     0
##  [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0
##  [4,]    0    0    0    0    0    0    0    0    0     0     0     0     0
##  [5,]    0    0    0    0    0    0    0    0    0     0     0     0     0
##  [6,]    1    1    1    1    1    1    1    1    1     1     1     1     1
##  [7,]    1    1    1    1    1    1    1    1    1     1     1     1     1
##  [8,]    1    1    1    1    1    1    1    1    1     1     1     1     1
##  [9,]    1    1    1    1    1    1    1    1    1     1     1     1     1
## [10,]    1    1    1    1    1    1    1    1    1     1     1     1     1
## [11,]    1    1    1    1    1    1    1    1    1     1     1     1     1
## [12,]    0    0    0    0    0    0    0    0    0     0     0     0     0
## [13,]    0    0    0    0    0    0    0    0    0     0     0     0     0
## [14,]    0    0    0    0    0    0    0    0    0     0     0     0     0
## [15,]    0    0    0    0    0    0    0    0    0     0     0     0     0
## [16,]    0    0    0    0    0    0    0    0    0     0     0     0     0
##       [,14] [,15] [,16]
##  [1,]     0     0     0
##  [2,]     0     0     0
##  [3,]     0     0     0
##  [4,]     0     0     0
##  [5,]     0     0     0
##  [6,]     1     1     1
##  [7,]     1     1     1
##  [8,]     1     1     1
##  [9,]     1     1     1
## [10,]     1     1     1
## [11,]     1     1     1
## [12,]     0     0     0
## [13,]     0     0     0
## [14,]     0     0     0
## [15,]     0     0     0
## [16,]     0     0     0

Create a matrix that shows the rank of receivers NOTE: Receivers are represented by COLUMNS in the matrices

sex_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)

for (i in 1:nodes){ # for 1 through the number of nodes in the data set
sex_receiving[,i] <- rep(sex[i], nodes)} # age of each actor is repeated over entire COLUMN of matrix
sex_receiving
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [2,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [3,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [4,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [5,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [6,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [7,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [8,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##  [9,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [10,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [11,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [12,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [13,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [14,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [15,]    0    0    0    0    0    1    1    1    1     1     1     0     0
## [16,]    0    0    0    0    0    1    1    1    1     1     1     0     0
##       [,14] [,15] [,16]
##  [1,]     0     0     0
##  [2,]     0     0     0
##  [3,]     0     0     0
##  [4,]     0     0     0
##  [5,]     0     0     0
##  [6,]     0     0     0
##  [7,]     0     0     0
##  [8,]     0     0     0
##  [9,]     0     0     0
## [10,]     0     0     0
## [11,]     0     0     0
## [12,]     0     0     0
## [13,]     0     0     0
## [14,]     0     0     0
## [15,]     0     0     0
## [16,]     0     0     0
lrqap <- netlogit(sp_play_net_unweighted, # response variable is a network object with an unweighted adjacency matrix
                   list(sex_receiving, sex_sending)) # list of predictor variables as network or matrix objects
summary(lrqap)
## 
## Network Logit Model
## 
## Coefficients:
##             Estimate   Exp(b)     Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) -2.4875920 0.08310985 0.000   1.000   0.000    
## x1           0.4783349 1.61338572 0.896   0.104   0.211    
## x2           0.1100466 1.11633005 0.562   0.438   0.859    
## 
## Goodness of Fit Statistics:
## 
## Null deviance: 332.7106 on 240 degrees of freedom
## Residual deviance: 150.4112 on 237 degrees of freedom
## Chi-Squared test of fit improvement:
##   182.2994 on 3 degrees of freedom, p-value 0 
## AIC: 156.4112    BIC: 166.8531 
## Pseudo-R^2 Measures:
##  (Dn-Dr)/(Dn-Dr+dfn): 0.4316829 
##  (Dn-Dr)/Dn: 0.5479219 
## Contingency Table (predicted (rows) x actual (cols)):
## 
##       0     1
## 0   217    23
## 1     0     0
## 
##  Total Fraction Correct: 0.9041667 
##  Fraction Predicted 1s Correct: NaN 
##  Fraction Predicted 0s Correct: 0.9041667 
##  False Negative Rate: 1 
##  False Positive Rate: 0 
## 
## Test Diagnostics:
## 
##  Null Hypothesis: qap 
##  Replications: 1000 
##  Distribution Summary:
## 
##        (intercept)         x1         x2
## Min     -6.676e+00 -2.051e+00 -2.690e+00
## 1stQ    -5.254e+00 -6.338e-01 -1.111e+00
## Median  -4.616e+00  3.775e-15  9.107e-02
## Mean    -4.376e+00 -1.450e-02  8.919e-02
## 3rdQ    -3.738e+00  6.039e-01  1.085e+00
## Max      3.296e-02  2.343e+00  4.163e+00

References

[…]